
    Adding New Tasks to a Single Network with Weight Transformations using Binary Masks

    Visual recognition algorithms are required today to exhibit adaptive abilities. Given a deep model trained on a specific task, it would be highly desirable to adapt it incrementally to new tasks, preserving scalability as the number of tasks increases while avoiding catastrophic forgetting. Recent work has shown that masking the internal weights of a given original conv-net through learned binary variables is a promising strategy. We build upon this intuition and consider more elaborate affine transformations of the convolutional weights that include learned binary masks. We show that with our generalization it is possible to achieve significantly higher levels of adaptation to new tasks, enabling the approach to compete with fine-tuning strategies while requiring slightly more than 1 bit per network parameter per additional task. Experiments on two popular benchmarks showcase the power of our approach, which achieves the new state of the art on the Visual Decathlon Challenge.
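    A minimal sketch of the per-task weight transformation described above, assuming a scalar affine combination (k0, k1, k2) of the frozen base weights with a learned binary mask obtained by a straight-through threshold; these parameter names and the exact form are illustrative, not the paper's definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedAffineConv2d(nn.Module):
    """Wraps a frozen, shared conv layer with a per-task mask and affine coefficients."""

    def __init__(self, base_conv: nn.Conv2d):
        super().__init__()
        self.base = base_conv
        for p in self.base.parameters():          # the original weights stay fixed
            p.requires_grad_(False)
        # Real-valued mask scores, thresholded to ~1 bit per weight per task.
        self.mask_scores = nn.Parameter(torch.zeros_like(base_conv.weight))
        # A handful of scalar affine coefficients learned per task.
        self.k0 = nn.Parameter(torch.ones(1))
        self.k1 = nn.Parameter(torch.zeros(1))
        self.k2 = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Straight-through estimator: hard 0/1 mask forward, identity gradient backward.
        hard = (self.mask_scores > 0).float()
        m = hard + self.mask_scores - self.mask_scores.detach()
        w = self.base.weight
        w_task = self.k0 * w + self.k1 * w * m + self.k2 * m   # affine transform of the weights
        return F.conv2d(x, w_task, self.base.bias,
                        stride=self.base.stride, padding=self.base.padding,
                        dilation=self.base.dilation, groups=self.base.groups)
```

    At test time only the binarized mask and the few scalars need to be stored per task, which is where the roughly 1-bit-per-parameter overhead mentioned in the abstract comes from.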

    Deep Shape Matching

    We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. First, well-established, efficient methods are chosen to turn the images into edge maps. Second, the network is trained with edge maps of landmark images, which are obtained automatically by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval, and its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results on multiple benchmarks.
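    A minimal sketch of the two-stage pipeline above: images are first turned into edge maps, then a network is trained on those edge maps with a metric-learning objective. The Canny extractor, the toy embedding network, and the contrastive loss below are illustrative stand-ins, not necessarily the authors' choices.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

def to_edge_map(image_bgr: np.ndarray) -> torch.Tensor:
    """Stage 1: reduce an image to a single-channel edge map (stand-in extractor)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    return torch.from_numpy(edges).float().div_(255.0).unsqueeze(0)   # shape (1, H, W)

class EdgeEmbedder(nn.Module):
    """Stage 2: embed edge maps into a metric space with a small CNN."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)   # unit-length descriptor

def contrastive_loss(a, b, same, margin: float = 0.7):
    """Pulls matching edge-map pairs together, pushes non-matching pairs apart."""
    d = (a - b).pow(2).sum(-1).sqrt()
    return (same * d.pow(2) + (1 - same) * (margin - d).clamp(min=0).pow(2)).mean()
```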

    Aligning Salient Objects to Queries: A Multi-modal and Multi-object Image Retrieval Framework

    In this paper we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities in a common embedding space, which is then further aligned with the image feature space. Our architecture also relies on salient object detection through a supervised LSTM-based visual attention model learned from convolutional features. Both the alignment between the queries and the image and the supervision of the attention on the images are obtained by generalizing the Hungarian algorithm with different loss functions. This permits encoding the object-based features and their alignment with the query irrespective of whether different objects co-occur in the training set. We validate the performance of our approach on standard single- and multi-object datasets, showing state-of-the-art performance on every dataset. (ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, 2-6 December 2018.)
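    A minimal sketch of the query-to-object alignment step mentioned above, assuming a cosine cost between query embeddings and detected-object embeddings solved with the Hungarian algorithm; the cost function and the returned matched cost are illustrative choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_queries_to_objects(query_emb: np.ndarray, object_emb: np.ndarray):
    """query_emb: (Q, D) sketch/text embeddings; object_emb: (O, D) salient-object embeddings."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    o = object_emb / np.linalg.norm(object_emb, axis=1, keepdims=True)
    cost = 1.0 - q @ o.T                         # cosine distance, shape (Q, O)
    rows, cols = linear_sum_assignment(cost)     # Hungarian matching
    matches = list(zip(rows.tolist(), cols.tolist()))
    return matches, float(cost[rows, cols].mean())   # matched pairs + mean matched cost
```

    A loss defined over the matched pairs (for example, the mean matched cost) can then supervise both the embedding alignment and the attention, along the lines the abstract outlines.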

    Inner Space Preserving Generative Pose Machine

    Image-based generative methods, such as generative adversarial networks (GANs), are already able to generate realistic images with substantial context control, especially when they are conditioned. However, most successful frameworks share a common procedure: they perform an image-to-image translation that leaves the pose of figures in the image untouched. When the objective is reposing a figure in an image while preserving the rest of the image, the state of the art mainly assumes a single rigid body with a simple background and limited pose shift, which can hardly be extended to images under normal settings. In this paper, we introduce an image "inner space" preserving model that assigns an interpretable low-dimensional pose descriptor (LDPD) to an articulated figure in the image. Figure reposing is then generated by passing the LDPD and the original image through multi-stage augmented hourglass networks in a conditional GAN structure, called the inner space preserving generative pose machine (ISP-GPM). We evaluated ISP-GPM on reposing human figures, which are highly articulated with versatile variations. Testing a state-of-the-art pose estimator on our reposed dataset gave an accuracy of over 80% on the PCK0.5 metric. The results also show that ISP-GPM preserves the background with high accuracy while reasonably recovering the area blocked by the figure to be reposed. Comment: http://www.northeastern.edu/ostadabbas/2018/07/23/inner-space-preserving-generative-pose-machine
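    A minimal sketch of the conditioning interface described above: the original image and a low-dimensional pose descriptor are fed jointly to a generator by broadcasting the descriptor to spatial maps and concatenating it with the image channels. The tiny network below stands in for the multi-stage augmented hourglass generator; the class name and descriptor dimensionality are assumptions.

```python
import torch
import torch.nn as nn

class PoseConditionedGenerator(nn.Module):
    """Generates a reposed image from (original image, low-dimensional pose descriptor)."""

    def __init__(self, pose_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + pose_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

    def forward(self, image: torch.Tensor, ldpd: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W); ldpd: (B, pose_dim) broadcast to (B, pose_dim, H, W)
        b, _, h, w = image.shape
        pose_maps = ldpd[:, :, None, None].expand(b, ldpd.shape[1], h, w)
        return self.net(torch.cat([image, pose_maps], dim=1))
```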

    Free-hand sketch synthesis with deformable stroke models

    We present a generative model which can automatically summarize the stroke composition of free-hand sketches of a given category. When our model is fit to a collection of sketches with similar poses, it discovers and learns the structure and appearance of a set of coherent parts, with each part represented by a group of strokes. It captures both the consistent (topology) and the diverse (structure and appearance variations) aspects of each sketch category. Key to the success of our model are important insights learned from a comprehensive study performed on human stroke data. By fitting this model to images, we are able to synthesize visually similar and pleasing free-hand sketches.
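    A minimal illustration of the part-discovery idea above, assuming each sketch is available as a list of stroke polylines: strokes pooled from similarly posed sketches are described by simple geometric features and grouped into candidate parts. The features and the k-means grouping are stand-ins for the paper's deformable stroke model fitting.

```python
import numpy as np
from sklearn.cluster import KMeans

def stroke_features(stroke: np.ndarray) -> np.ndarray:
    """stroke: (P, 2) polyline of x/y points -> a small geometric descriptor."""
    centroid = stroke.mean(axis=0)
    length = np.linalg.norm(np.diff(stroke, axis=0), axis=1).sum()
    extent = stroke.max(axis=0) - stroke.min(axis=0)
    return np.concatenate([centroid, [length], extent])

def group_strokes_into_parts(strokes, n_parts: int = 6) -> np.ndarray:
    """strokes: list of (P_i, 2) arrays pooled from aligned sketches of one category."""
    feats = np.stack([stroke_features(s) for s in strokes])
    labels = KMeans(n_clusters=n_parts, n_init=10).fit_predict(feats)
    return labels   # strokes sharing a label form one candidate part
```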

    Transferring Neural Representations for Low-dimensional Indexing of Maya Hieroglyphic Art

    We analyze the performance of deep neural architectures for extracting shape representations of binary images and for generating low-dimensional representations of them. In particular, we focus on indexing binary images exhibiting compounds of Maya hieroglyphic signs, referred to as glyph-blocks, which constitute a very challenging artistic dataset given their visual complexity and large stylistic variety. More precisely, we demonstrate empirically that intermediate outputs of convolutional neural networks can be used as representations for complex shapes, even when their parameters are trained on gray-scale images, and that these representations can be more robust than traditional handcrafted features. We also show that it is possible to compress such representations down to only three dimensions without harming much of their discriminative structure, such that effective visualizations of Maya hieroglyphs can be rendered for subsequent epigraphic analysis.
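    A minimal sketch of the indexing pipeline above: an intermediate convolutional output of a pretrained network serves as the shape descriptor for a binary glyph-block image, and the descriptors are then compressed to three dimensions for visualization. The choice of torchvision's VGG16 features and of PCA as the compressor are illustrative assumptions.

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.decomposition import PCA

# Pretrained backbone used only as a fixed feature extractor (illustrative choice).
backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

@torch.no_grad()
def glyph_descriptor(binary_image: torch.Tensor) -> torch.Tensor:
    """binary_image: (1, 1, H, W) in [0, 1]; replicated to 3 channels for the network."""
    x = binary_image.repeat(1, 3, 1, 1)
    fmap = backbone(x)                        # intermediate conv output, (1, 512, h, w)
    return fmap.mean(dim=(2, 3)).squeeze(0)   # global-average-pooled shape descriptor

def compress_to_3d(descriptors: np.ndarray) -> np.ndarray:
    """descriptors: (N, 512) -> (N, 3) for plotting and epigraphic browsing."""
    return PCA(n_components=3).fit_transform(descriptors)
```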

    Identifying core MRI sequences for reliable automatic brain metastasis segmentation

    BACKGROUND: Many automatic approaches to brain tumor segmentation employ multiple magnetic resonance imaging (MRI) sequences. The goal of this project was to compare different combinations of input sequences to determine which MRI sequences are needed for effective automated brain metastasis (BM) segmentation. METHODS: We analyzed preoperative imaging (T1-weighted sequence ± contrast enhancement (T1/T1-CE), T2-weighted sequence (T2), and T2 fluid-attenuated inversion recovery (T2-FLAIR) sequence) from 339 patients with BMs from seven centers. A baseline 3D U-Net with all four sequences and six U-Nets with plausible sequence combinations (T1-CE, T1, T2-FLAIR, T1-CE + T2-FLAIR, T1-CE + T1 + T2-FLAIR, T1-CE + T1) were trained on 239 patients from two centers and subsequently tested on an external cohort of 100 patients from five centers. RESULTS: The model based on T1-CE alone achieved the best performance for BM segmentation, with a median Dice similarity coefficient (DSC) of 0.96. Models trained without T1-CE performed worse (T1 only: DSC = 0.70; T2-FLAIR only: DSC = 0.73). For edema segmentation, models that included both T1-CE and T2-FLAIR performed best (DSC = 0.93), while the remaining four models, which did not include both of these sequences simultaneously, reached a median DSC of 0.81-0.89. CONCLUSIONS: A T1-CE-only protocol suffices for the segmentation of BMs. The combination of T1-CE and T2-FLAIR is important for edema segmentation. Missing either T1-CE or T2-FLAIR decreases performance. These findings may improve imaging routines by omitting unnecessary sequences, thus allowing for faster procedures in daily clinical practice while enabling optimal neural-network-based target definitions.
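    A minimal sketch of the evaluation metric reported above: the Dice similarity coefficient (DSC) between a predicted and a reference binary segmentation mask. The smoothing constant is an illustrative choice to avoid division by zero on empty masks.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """pred, target: boolean/0-1 arrays of the same shape (e.g. 3D metastasis masks)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))
```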

    Fast character modeling with sketch-based PDE surfaces

    Virtual characters are 3D geometric models of characters with many applications in multimedia. In this paper, we propose a new physics-based deformation method and an efficient character modelling framework for the creation of detailed 3D virtual character models. Our physics-based deformation method uses PDE surfaces, where PDE stands for partial differential equation: PDE surfaces are sculpting-force-driven shape representations of interpolation surfaces, which are obtained by interpolating key cross-section profile curves, and the sculpting-force-driven representation uses an analytical solution to a vector-valued partial differential equation involving sculpting forces to obtain deformed shapes quickly. Our character modelling framework consists of global modelling and local modelling. Global modelling, also called model building, creates a whole character model quickly with sketch-guided and template-based modelling techniques. Local modelling produces local details efficiently to improve the realism of the created character model with four shape manipulation techniques. Sketch-guided global modelling generates a character model from three different levels of sketched profile curves, called primary, secondary and key cross-section curves, in three orthographic views. Template-based global modelling obtains a new character model by deforming a template model to match the three levels of profile curves. Four shape manipulation techniques for local modelling are investigated and integrated into the new framework: partial differential equation-based shape manipulation, generalized elliptic curve-driven shape manipulation, sketch-assisted shape manipulation, and template-based shape manipulation. These local modelling techniques provide both global and local shape control and are efficient in local shape manipulation. The final character models are represented with a collection of surfaces modelled with two types of geometric entities: generalized elliptic curves (GECs) and partial differential equation-based surfaces. Our experiments indicate that the proposed modelling approach can build detailed and realistic character models easily and quickly.
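    For reference, a hedged sketch of the kind of vector-valued, sculpting-force-driven PDE from which such surfaces are commonly generated; the exact operator, shape-control parameters, and boundary conditions used in the paper may differ.

```latex
% A commonly used fourth-order form for force-driven PDE surfaces (assumed, not
% taken verbatim from the paper): X(u, v) is the surface point, F(u, v) the
% sculpting force, a and b are shape-control parameters, and the boundary
% conditions come from the interpolated cross-section profile curves.
\[
  \frac{\partial^{4} \mathbf{X}(u,v)}{\partial u^{4}}
  + a \, \frac{\partial^{4} \mathbf{X}(u,v)}{\partial u^{2}\,\partial v^{2}}
  + b \, \frac{\partial^{4} \mathbf{X}(u,v)}{\partial v^{4}}
  = \mathbf{F}(u,v)
\]
```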

    Sketch-a-Net: A Deep Neural Network that Beats Humans

    This project received support from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement #640891, and from the Royal Society and Natural Science Foundation of China (NSFC) Joint Grant #IE141387 and #61511130081. We gratefully acknowledge the support of NVIDIA Corporation for the donation of the GPUs used for this research.